The use of computer networks has grown rapidly. Computer users around the world increasingly need to connect their computers to the Internet, whether for work or to access social media accounts, which shows how important Internet networks have become. However, this widespread use of computer networks puts users' privacy at risk, especially for users who have not installed security software on their computers. This gap allows hackers to carry out network attacks and steal confidential information, such as bank or social media login credentials. One such attack is the phishing attack. The goal of this study is to review the types of phishing attacks and the current methods used to prevent them. Based on the literature, machine learning methods are widely used to prevent phishing attacks, and several algorithms can be used within this approach. This study focuses on these algorithms, and the methods for implementing each of them are discussed in detail.
1. INTRODUCTION

Network security is an important and critical issue that needs to be considered and emphasized, especially in an organization. Authorizing access to a person's account is one form of network security; the process commonly uses a username and password to prevent and monitor unauthorized access to a particular account [1]. An authentication process is needed to protect sensitive and important data from being exposed to and stolen by unauthorized users or hackers. In certain situations, even when network security has been implemented, there is still a chance that sensitive and important data can be stolen. One way to steal such data is through a phishing attack.
A phishing attack can be considered as an act of imitating genuine websites to collect sensitive information from a victim and use it to commit crimes, such as obtaining illegal financial gains [2]. This attack typically starts when the attacker sends the victim an email that appears genuine and persuades them to update and verify their information by clicking on a Uniform Resource Locator (URL) link in the email [3]. Usually, the link redirects users to a malicious website that asks them to provide information such as their personal details and bank account information, which the attacker then uses to steal whatever important information the users have entered [4]. Phishing attacks are closely related to the spam emails received by victims, some of which contain links that redirect victims to phishing websites. Phishing attacks are usually difficult to identify because the email sent by the attacker looks like a legitimate email. In addition, the attacker can hide the location of their server and sometimes disguise the URL of the phishing website so that it resembles the legitimate website. Moreover, even good security software may be unable to detect phishing websites, because phishing does not depend on infecting the computer with malware [5].

Currently, many works on phishing attack detection have been proposed in the literature and in commercial products. Four types of features can be used in detecting a phishing attack, as given in Figure 1. The URL-based feature works on the URL itself. A phishing attack relies on a URL that redirects a user to a page that the attacker has duplicated from the official page, so the URL and the duplicated page can be recognized from a malicious URL. A malicious URL can be detected based on the total length of the URL, the number of digits in the URL, the correct spelling of the URL, and whether the URL includes a legitimate brand name. The domain-based feature examines the domain name of the URL to determine whether the URL can be classified as a phishing attack. The URL can be considered as phishing based on the status of its domain name: whether the domain name is on the blacklists of well-known reputation services, the age of the domain name, and the owner of the domain name. The third feature, the page-based feature, works on information about the page obtained from reputation ranking services; this reputation determines the reliability of the page. Normally, the reputation ranking is determined by the Global PageRank, the Country PageRank, and the position indexed by Alexa. Ranking services usually also give information about user activity on the site, including the estimated number of daily, weekly, or monthly visitors to a page, the average number of visits to the page, web traffic, the category of the domain, and websites similar to the page. Meanwhile, the content-based feature works by scanning the content of the page. The items scanned are usually the page title, meta tags, hidden text, text in the body, and images in the page. The scanning determines whether the page requires a login, the category of the page, and the users of the page.
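As a rough illustration of the URL-based feature, the sketch below computes the total URL length, the number of digits, whether a known brand name appears as a token, and a crude spelling check that flags tokens closely resembling, but not matching, a brand name. The brand list, the tokenization rule, and the 0.8 similarity threshold are assumptions chosen for illustration; they are not taken from the works cited here.

```python
import re
from difflib import SequenceMatcher

# Hypothetical brand list for illustration; a real detector would use a curated list.
BRAND_NAMES = ("paypal", "google", "facebook")


def url_based_features(url: str, similarity_threshold: float = 0.8) -> dict:
    """Compute a few URL-based features (illustrative sketch, not a cited method)."""
    lowered = url.lower()
    # Split the URL into alphanumeric tokens, e.g. "http://paypa1.com" -> ["http", "paypa1", "com"].
    tokens = [t for t in re.split(r"[^a-z0-9]+", lowered) if t]
    brand_in_url = any(brand in tokens for brand in BRAND_NAMES)
    # Crude spelling check: a token that is very similar to, but not identical with,
    # a brand name (e.g. "paypa1" vs "paypal") suggests an imitation.
    misspelled_brand = any(
        token != brand
        and SequenceMatcher(None, token, brand).ratio() >= similarity_threshold
        for token in tokens
        for brand in BRAND_NAMES
    )
    return {
        "url_length": len(url),                           # total length of the URL
        "digit_count": sum(ch.isdigit() for ch in url),   # number of digits in the URL
        "brand_in_url": brand_in_url,
        "misspelled_brand": misspelled_brand,
    }
```

For example, `url_based_features("http://paypa1.example123.com/login")` reports four digits and flags `paypa1` as a likely misspelling of `paypal`, while no exact brand token is present.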



Figure 1. Features in phishing attack detection 

All of the features discussed are widely used in the identification of phishing attacks. In some cases, however, they may not be effective in detecting a phishing attack because of their limitations. For example, if the content-based feature is used in a mechanism that must quickly analyze many pages for phishing, scanning such a large number of pages will take time. Hence, the feature should be selected carefully according to the objective of the detection mechanism.


2. RESEARCH METHOD 

Detecting phishing attacks is a challenging task because the attack exploits human vulnerabilities rather than system errors. Phishing attack detection can be considered as a classification problem, in which each page must be labeled as either a phishing page or a legitimate page. For this purpose, a good method is needed. Machine learning (ML) methods are appropriate for phishing attack detection because ML can treat the detection problem as a classification task.

ML is a subfield of artificial intelligence whose goal is to enable the computer to learn on its own, or to learn from experience. ML is based on the idea that computational methods can learn information directly from data, identify patterns in the observed data, and make decisions without relying on a predetermined equation as a model. In supervised ML, a model is trained on known inputs and outputs so that it can predict the class of new data. This technique suits phishing detection because the detection problem can be expressed as a classification task.

In detecting a phishing attack, the ML method trains a classifier with features or rules to decide whether a given page is phishing or not. Usually, the ML method extracts the features discussed in the previous section from a URL or the content of a web page, trains a prediction model on these features, and then decides whether the web page is legitimate or fake. Many ML methods are currently and widely used for detecting phishing attacks, including the Artificial Neural Network (ANN) algorithm, the Decision Tree (DT) algorithm, the k-means clustering algorithm, the Naive Bayes (NB) algorithm, the Random Forest (RF) algorithm, and the Support Vector Machine (SVM) algorithm. These methods were chosen because of their performance and high accuracy in detecting phishing attacks, and they are described below.

2.1. Artificial Neural Network 

The ANN is a model inspired by the structure of the human brain that simulates human behavior in processing. Learning takes place in these networks through a set of simple processing units, called artificial neurons [6, 7]. ANN is an information-processing model that simulates how the biological nervous system processes information [8-10]. An ANN is organized in layers, where each layer has artificial neurons that are connected with each other. The basic elements of an ANN are shown in Figure 2.





Figure 2. Basic elements of an ANN [6] 


Based on Figure 2, the input vector of the neuron is $(x_1, x_2, \ldots, x_n)$, and each input is multiplied by its respective weight $(w_1, w_2, \ldots, w_n)$. The additive junction, also known as the sum, is represented by the letter sigma ($\Sigma$), whereas the activation function and the output are denoted by $\varphi$ and $y$, respectively.
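The computation performed by the neuron in Figure 2 can be sketched as y = φ(Σ w_i x_i). The version below assumes a sigmoid activation and an optional bias term; both are illustrative choices, since the figure does not fix a particular activation function.

```python
import math
from typing import Callable, Sequence


def sigmoid(z: float) -> float:
    """Logistic activation function (an assumed choice for phi)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(
    x: Sequence[float],
    w: Sequence[float],
    bias: float = 0.0,
    activation: Callable[[float], float] = sigmoid,
) -> float:
    """Output y = phi(sum_i w_i * x_i + bias) of a single artificial neuron."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    # Additive junction (sigma): weighted sum of the inputs plus an optional bias.
    s = sum(wi * xi for wi, xi in zip(w, x)) + bias
    # Activation function (phi) turns the sum into the neuron's output y.
    return activation(s)
```

With inputs (1, 2) and weights (0.5, −0.25), the weighted sum is 0, so the sigmoid output is 0.5. Swapping in a step function as `activation` gives the classic perceptron-style neuron.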

A work that used this approach was proposed in [6], with the aim of classifying websites with phishing characteristics. The results showed that the ANN correctly classified 87.61% of the 1,000 training records obtained from the Phishing Websites Data Set of the University of California's Machine Learning and Intelligent Systems Learning Center. Compared with other methods, such as the dynamic evolving neural network based on reinforcement learning, which achieved a slightly higher accuracy of 98.63%, the difference in accuracy was 0.40%. This may be due to the order of the attributes used during the implementation, and the study suggested changing the order of the attributes to find better groups to be processed by the ANN. In addition, [11] proposed an ANN approach for detecting phishing in emails. The neural network models were tested with 17 and 12 features. The network with 17 features was tested using 8,801 vectors (587 training vectors and 8,214 test vectors), whereas the network with 12 features was tested with 8,697 vectors (282 training vectors and 8,415 test vectors). The accuracy obtained was the same with 4, 5, and 6 hidden neurons, while the training time increased as the number of hidden neurons increased. Thus, 4 hidden nodes were taken, which showed the best result. The number of hidden neurons taken was 3 since it showed the best accuracy among all of them. The results also showed that when the number of nodes increased above 3 neurons, the accuracy dropped to 50%. Next, [12] proposed phishing URL detection using an ANN with Particle Swarm Optimization (PSO). Their aim was to show that ANN-PSO can achieve better accuracy than a Back Propagation Neural Network (BPNN). The results show that the ANN-PSO model achieved better accuracy: all accuracy readings for three different learning ratios in the ANN-PSO model are above 97%, whereas the highest accuracy reading for the BPNN is 96.81%.

2.2. Decision tree algorithm 

The DT algorithm is a supervised classification algorithm. It is used to solve regression and classification problems by creating a training model that predicts the class or value of target variables from rules learned from the training data. Two widely used DT algorithms are the Iterative Dichotomiser 3 (ID3) algorithm and the C4.5 algorithm. ID3 builds the decision tree in a “top-down” manner. It has proven to be a very useful method, but it still has many constraints that hinder its application in many real-world situations [13]. The C4.5 algorithm was developed to overcome these problems and is considered a good solution for large datasets, missing values, and continuous variables. The attribute-selection measures used by these DT algorithms can be expressed as (1)-(5):


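$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i) \tag{1}$$

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j) \tag{2}$$

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right) \tag{3}$$

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D) \tag{4}$$

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)} \tag{5}$$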


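where $D$ is the training set of class-labeled tuples, $A$ is the attribute being evaluated, $D_j$ is the subset of $D$ whose tuples take the $j$-th of the $v$ values of $A$, and $C_i$ is a class label (for $i = 1, 2, \ldots, m$). Meanwhile, $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$, and $|D|$ is the number of tuples in $D$. ID3 selects the attribute with the highest information gain (4), whereas C4.5 selects the attribute with the highest gain ratio (5).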
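To make (1)-(5) concrete, the following sketch computes them for a single discrete attribute. The attribute and labels in the usage example (whether a URL contains '@', labeled phishing or legitimate) are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def info(labels: Sequence[Hashable]) -> float:
    """Equation (1): expected information (entropy) of the class labels in D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(values: Sequence[Hashable], labels: Sequence[Hashable]) -> List[List[Hashable]]:
    """Split the labels into subsets D_j, one per distinct value of attribute A."""
    groups: dict = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def info_a(values, labels) -> float:
    """Equation (2): information still needed after splitting D on attribute A."""
    n = len(labels)
    return sum(len(d) / n * info(d) for d in partition(values, labels))


def split_info(values, labels) -> float:
    """Equation (3): split information of attribute A."""
    n = len(labels)
    return -sum(len(d) / n * math.log2(len(d) / n) for d in partition(values, labels))


def gain(values, labels) -> float:
    """Equation (4): information gain of attribute A (used by ID3)."""
    return info(labels) - info_a(values, labels)


def gain_ratio(values, labels) -> float:
    """Equation (5): gain ratio of attribute A (used by C4.5); 0 if SplitInfo is 0."""
    si = split_info(values, labels)
    return gain(values, labels) / si if si > 0 else 0.0
```

For a toy attribute that perfectly separates four URLs into two phishing and two legitimate ones, Info(D) = 1, Info_A(D) = 0, and both Gain(A) and GainRatio(A) equal 1. An attribute with a single value has SplitInfo_A(D) = 0, so the sketch returns a gain ratio of 0 instead of dividing by zero.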

In a work proposed by [14], the C4.5 algorithm in WEKA was used to detect phishing websites. Their work used a dataset that contained 300 websites, and 200 websites were detected as phishing websites. After the prediction confusion matrix was generated, the success rate and error rate obtained were 0.826 and 0.173, respectively. Thus, the accuracy of the classifier model trained with 750 instances was 82.6%. The authors of [15] proposed a phishing detection system using the ID3 algorithm, with the objective of distinguishing whether a URL was legitimate or phishing. The proposed system had four main steps: step 1, data preparation; step 2, feature extraction from the URL; step 3, implementation of the ID3 algorithm, which performs the classification; and step 4, the resulting model of their method. Another work using the ID3 algorithm was performed by [16]. They used ID3 as the classifier and applied the info gain feature selection technique with different top-ranked feature subsets to determine phishing websites. Their experiments used data from the UCI repository consisting of 30 features, 11,055 instances, and one class attribute that labels each website as phishing or legitimate. From the results obtained, their method performed well compared to other classification methods.


2.3. K-means clustering approach 

K-means clustering is an algorithm that groups data points into different clusters by minimizing the distance between the elements of each cluster and its centroid [17]. Basically, the k-means algorithm partitions n observations into k clusters, in which each observation belongs to the cluster with the nearest mean [18]. According to [19], the algorithm iterates between two steps, the data assignment step and the centroid update step, as follows:


$$\underset{c_i \in C}{\arg\min}\ \mathrm{dist}(c_i, x)^2 \tag{6}$$

$$c_i = \frac{1}{|S_i|} \sum_{x_i \in S_i} x_i \tag{7}$$


where $\mathrm{dist}(\cdot)$ is the standard Euclidean distance, $C$ is the set of cluster centroids, and $S_i$ is the set of data points assigned to the $i$-th cluster centroid. Equations (6) and (7) are iterated until a stopping criterion is met, for example when the assignments no longer change.
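The two-step iteration can be sketched in plain Python as below. Initializing the centroids by sampling k data points at random and stopping when the centroids no longer change are assumptions made for this sketch; [19] may use a different initialization or stopping criterion.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Plain k-means: alternate the assignment step (6) and the update step (7)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        # Assignment step (6): each point joins the centroid with the smallest squared distance.
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda i: math.dist(centroids[i], x) ** 2)
            clusters[nearest].append(x)
        # Update step (7): each centroid becomes the mean of the points assigned to it.
        new_centroids: List[Point] = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        # Assumed stopping criterion: the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

On four points forming two well-separated pairs, such as (0, 0), (0, 1), (10, 10), and (10, 11), the iteration settles on centroids (0, 0.5) and (10, 10.5). Because the result depends on the initial centroids, practical implementations often run several random restarts.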
A work that used this approach was conducted by [20], who proposed kernel k-means clustering, an extension of the k-means algorithm. The vectors were mapped from the vector space to a higher-dimensional feature space through a kernel function, and k-means was then applied in the feature space. The results showed that the proposed kernel k-means method performed better than the ensemble clustering algorithm: the accuracy increased by 13.25% when testing 500 phishing websites and by 1.48% when testing 2,000 phishing websites. The k-means clustering approach was also used by [21], who compared it with three other approaches: multilayer perceptron (MLP), J48 decision tree, and Naive Bayes. The results showed that the prediction accuracy of the k-means clustering approach, 99%, was higher than that of the other three approaches. Other characteristics, such as the time taken to build the model and the numbers of correctly and incorrectly classified instances, were also compared. Although this approach took longer to build the model than MLP, it had the highest number of correctly classified instances and the lowest number of incorrectly classified instances. Another work that used this approach was proposed by [17], who used relational k-means clustering, which deals with non-vector data. The study was done with 5, 6, and 7 clusters. The results showed that the 5-cluster selection was better than the others, as both the mean and the standard deviation indicated an identical distribution of emails across the clusters. This showed that the classification accuracy of the email contents was higher with the 5-cluster selection.

2.4. Naive Bayes algorithm 

The NB algorithm, also known as the Bayesian classifier, is a group of classification algorithms based on Bayes' theorem. It is a probabilistic classifier with strong independence assumptions between features. All NB algorithms share a common principle: every feature being classified is assumed to be independent of every other feature, given the class. The general equation for Bayes' theorem can be expressed as follows [22]:

$$P(x \mid Y) = \frac{P(Y \mid x)\,P(x)}{P(Y)} \tag{8}$$

where $P(x)$ is the prior probability of $x$, $P(Y)$ is the probability of $Y$, $P(Y \mid x)$ is the likelihood, that is, the conditional probability of $Y$ given $x$, and $P(x \mid Y)$ is the posterior, that is, the conditional probability of $x$ given $Y$.
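A common way to apply (8) together with the independence assumption is a multinomial naive Bayes classifier over word tokens, similar in spirit to the token-based spam filter described in [23]. Here $x$ is the class (phishing or legitimate) and $Y$ is the observed set of tokens. The sketch assumes add-one (Laplace) smoothing and drops the denominator $P(Y)$, which is the same for every class and so does not change which class scores highest; it is an illustration, not the implementation used in any cited work.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


class TokenNaiveBayes:
    """Multinomial naive Bayes over word tokens with add-one (Laplace) smoothing."""

    def fit(self, docs: Sequence[List[str]], labels: Sequence[str]) -> "TokenNaiveBayes":
        self.class_counts = Counter(labels)                      # for the prior P(x)
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, y in zip(docs, labels):
            self.token_counts[y].update(tokens)                  # per-class token frequencies
        self.vocab = {t for tokens in docs for t in tokens}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, tokens: List[str]) -> Dict[str, float]:
        """Unnormalized log P(class | tokens) = log P(class) + sum of log P(token | class)."""
        scores = {}
        v = len(self.vocab)
        for y, n_y in self.class_counts.items():
            total = sum(self.token_counts[y].values())
            score = math.log(n_y / self.n_docs)                  # log prior
            for t in tokens:
                # Independence assumption: each token contributes its own factor.
                score += math.log((self.token_counts[y][t] + 1) / (total + v))
            scores[y] = score
        return scores

    def predict(self, tokens: List[str]) -> str:
        scores = self.log_posterior(tokens)
        return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many small token probabilities are multiplied, and smoothing keeps a token unseen in one class from forcing that class's probability to zero.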

In a work performed by [23], a Naive Bayes classifier was used as a text classification method to filter spam emails sent to victims. Their method used tokens, which represented the words used in spam and non-spam emails, to calculate the probability that an email was spam. The author of [24] used Naive Bayes as a classifier for web pages. The proposed method obtained information about a web page by extracting its features based on the URL, source, and images. The ant colony optimization algorithm was then applied to optimize the extraction process before Naive Bayes was used to determine whether the web page was legitimate or fake. The author of [25] proposed an improved method for spam email classification that determines whether an email contains a phishing URL. In the proposed method, the intelligent water drop algorithm was used for feature selection, and a Naive Bayes classifier then classified the email as legitimate or as containing a phishing URL.

2.5. Random forest algorithm 

The RF is an ensemble learning method for classification and regression tasks introduced by [26]. RF works with a set of decision trees: each input is fed in at the top of a tree and traverses down the tree into smaller subsets. In phishing attack detection, classification is carried out from the predictions of these decision trees. When growing the trees, the RF randomly chooses a subset of features to be used in the classification process, which keeps the choice of features unbiased. By doing this, the RF improves predictive accuracy and controls over-fitting. The RF has parameters that include the forest size T, the depth of the forest D, and the node (subset) i. Each node applies a binary split function, normally given as follows:

$$h(v, \theta_i) \in \{\text{true}, \text{false}\} \tag{9}$$

where $v = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ is the feature vector and $\theta_i$ denotes the optimal parameters of node $i$. According to [27], RF combines multiple classifiers, where each tree $b = 1, 2, \ldots, B$ contributes a class prediction $C_b(x)$ for the input vector $x$, and the forest assigns the most frequent of these classes. The class prediction of the forest, denoted as $C_{rf}^{B}(x)$, can be defined as follows:

$$C_{rf}^{B}(x) = \text{majority vote}\,\{C_b(x)\}_{1}^{B} \tag{10}$$
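The sketch below shows the ensemble side of RF: bootstrap sampling of the training data, random selection of a feature subset, and the majority vote of (10). Growing the individual decision trees is omitted; the "trees" in the usage example are hypothetical one-rule stumps written as plain functions, not trees learned from data.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_sample(data: Sequence[T], rng: random.Random) -> List[T]:
    """Draw len(data) items with replacement; each tree b is grown on such a sample."""
    return [rng.choice(data) for _ in range(len(data))]


def random_feature_subset(n_features: int, m: int, rng: random.Random) -> List[int]:
    """Pick m distinct feature indices to be considered when splitting."""
    return sorted(rng.sample(range(n_features), m))


def forest_predict(trees: Sequence[Callable[[Sequence[float]], Hashable]], x) -> Hashable:
    """Equation (10): majority vote over the per-tree predictions C_b(x)."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]
```

For example, with three stumps that test URL length (> 75), the presence of an '@' symbol, and the number of dots (> 3), an input on which two stumps answer "phishing" and one answers "legitimate" is labeled phishing by the forest.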

A work that used RF for phishing detection was proposed by [28]. They investigated the performance of RF in phishing detection, with the aim of improving prediction accuracy while using fewer features. In their method, they used only 8 out of 15 features. The proposed method was tested on data containing 2,000 phishing and ham emails and achieved 99.7% accuracy. Another work that used RF was carried out by [29], who proposed a two-stage approach to spam filtering consisting of feature selection and email classification. In the first stage, they used PSO-based wrapper feature selection to reduce the number of features. In the second stage, they used RF to build a filtering model with the features selected in the first stage. They tested their method on a dataset of 9,346 ham and spam emails, where every email had 79 features, and showed its effectiveness in classifying ham and spam emails that contained phishing attacks. Another work that used RF was performed by [30], who proposed a method for classifying phishing attacks based on URLs. Their method used URL information, that is, the metadata of the URL such as the number of slashes and the keywords in the URL. RF was then applied as a classifier to determine whether the URL was a phishing attack or a legitimate URL. They tested their method on two URL datasets, one with 2,500 records and 31 features and the other with 1,353 records.

2.6. Support Vector Machine algorithm 

The SVM algorithm, also known as the SVM classifier, is a machine learning algorithm mostly used in classification problems. It is a supervised learning technique that classifies a dataset containing class labels and features. According to [31], the SVM algorithm is a strong linear classifier that can separate two classes in a dataset. Among the hyperplanes that separate the classes, the algorithm selects the one with the maximum margin. The SVM algorithm can be expressed as (11) and (12):


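$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{n} \xi_i \tag{11}$$

$$y_i(w \cdot x_i - b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \tag{12}$$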


where $i = 1, 2, \ldots, n$ indexes the $n$ training samples, $x_i$ is the $i$-th input vector with class label $y_i \in \{-1, +1\}$, $w$ is the normal vector to the hyperplane, $b$ is the offset of the hyperplane, $C$ is the capacity constant, and $\xi_i$ are slack variables for handling non-separable data (inputs).
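For given values of $w$ and $b$, the smallest slack variables allowed by (12) are $\xi_i = \max(0, 1 - y_i(w \cdot x_i - b))$, and substituting them into (11) gives the value of the soft-margin objective. The sketch below evaluates this objective and the resulting decision rule; it does not solve the optimization itself, which in practice is done with a quadratic-programming or gradient-based solver.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def slack_variables(w, b, X, y) -> List[float]:
    """Smallest xi_i >= 0 satisfying constraint (12): y_i (w . x_i - b) >= 1 - xi_i."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) - b)) for xi, yi in zip(X, y)]


def svm_objective(w, b, X, y, C: float) -> float:
    """Objective (11): 0.5 * ||w||^2 + C * sum of the slack variables."""
    return 0.5 * dot(w, w) + C * sum(slack_variables(w, b, X, y))


def svm_predict(w, b, x) -> int:
    """Class +1 or -1 depending on which side of the hyperplane w . x - b = 0 the point lies."""
    return 1 if dot(w, x) - b >= 0 else -1
```

With $w = (1, 0)$, $b = 0$, and $C = 1$, the points (2, 0) and (−2, 0) lie outside the margin on the correct sides (zero slack), while (0.5, 0) with label +1 lies inside the margin with slack 0.5, so the objective is 0.5 + 0.5 = 1. A larger $C$ penalizes such margin violations more heavily.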

Since links to phishing websites are usually attached to spam emails, this algorithm is suitable for helping to detect them. Some attributes that are commonly used by the SVM algorithm to detect phishing websites are listed in Table 1 [31].


Table 1. The attributes and their significance in phishing attack detection by SVM

1. Internet Protocol (IP) address: The website is phishing if an IP address is used in place of the domain name.
2. URL length: A URL that is more than 75 characters long is considered a phishing website.
3. Shortening service: A shortened link could confuse the user.
4. Having '@' symbol: Websites whose URLs contain the '@' symbol are usually phishing websites.
5. Double slash redirecting: The website can be categorized as a phishing website if '//' appears later in its address, after the protocol part.
6. Having sub domain: Websites having more than two levels and more than three dots (a domain within a domain) could be phishing websites.
7. URL of Anchor: In phishing websites, anchor tags usually point to domains different from that of the page, whereas in legitimate websites the anchor tag is connected to the same domain as the source code.
8. Links in tags: This will lead to some infected websites.
9. Abnormal URL: Extracted from the database, while the main identity of a legitimate website is in its URL.
10. Age of domain: Websites whose domains are less than six months old can be classified as phishing websites.
11. Page rank: Phishing websites have a low page rank.
12. Links pointing to page: Phishing websites usually have links pointing to zip files that contain malware, which will be downloaded automatically to the computer.
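Several of the attributes in Table 1 can be computed directly from the URL string. The sketch below implements the IP address, URL length, shortening service, '@' symbol, double slash, and sub-domain checks using the thresholds stated in the table. The list of shortening services is a hypothetical, incomplete example, and the double-slash rule treats any '//' after position 7 (that is, after the 'https://' prefix) as suspicious, which is an assumption about how the rule is applied.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical, incomplete list of URL-shortening services.
SHORTENERS = {"bit.ly", "tinyurl.com"}


def table1_url_flags(url: str) -> dict:
    """Flag some Table 1 attributes for a URL (True means 'looks like phishing')."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError unless host is an IPv4/IPv6 address
        uses_ip = True
    except ValueError:
        uses_ip = False
    return {
        "ip_address": uses_ip,
        "long_url": len(url) > 75,
        "shortening_service": host in SHORTENERS,
        "has_at_symbol": "@" in url,
        # A '//' after the scheme prefix (index > 7) suggests a redirect.
        "double_slash_redirect": url.rfind("//") > 7,
        # More than three dots in the host name suggests a domain within a domain.
        "many_dots_in_host": host.count(".") > 3,
    }
```

These boolean flags could then be encoded as 0/1 values to form the feature vector $x_i$ fed to the SVM. Attributes such as the age of the domain or the page rank need external lookups and are not covered by this sketch.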


A work that used SVM was performed by [32], who presented a novel lightweight phishing detection approach that used URL-based features and applied SVM as the classifier. Their method achieved 95.80% classification accuracy when tested on a dataset of 2,000 URLs consisting of 1,000 legitimate and 1,000 phishing URLs. Another work that used SVM was carried out by [33], who developed a hybrid client-side phishing attack detection model. Their method worked on web content features and IP address features, and combined particle swarm optimization (PSO) with SVM: PSO was used to automatically optimize selected parameters of the model, and SVM was then used as the classifier to determine the phishing attack class. The authors of [34] also used SVM to classify phishing attacks. They used a kernel-based SVM method with features extracted from webpages, including textual properties, link structures, webpage contents, DNS information, and network traffic. Their method performed well in detecting phishing webpages, achieving an accuracy of around 95%.


3. COMPARATIVE ANALYSIS OF MACHINE LEARNING IN PHISHING 
ATTACK DETECTION 

Many factors need to be considered in choosing a suitable ML method for detecting phishing attacks. Factors that typically contribute to the performance of a method include the processing speed, the accuracy of the classifier, the size and complexity of the data, the interpretability of the model produced by the ML method, and the ease of applying the ML method to the specific problem. Table 2 lists a comparative analysis of the machine learning methods.


Table 2. Comparative analysis of machine learning methods

Artificial Neural Network
Advantages:
- The ANN allows the attributes and the type of learning performed to be specified [6].
- The ANN is fault tolerant: it can work with incomplete knowledge and noisy data [35].
- The ANN can develop an accurate model using experimental data only [36].
- The ANN has a distributed memory, which makes it suitable for parallel processing [37-39].
Disadvantages:
- The order of the data attributes may affect the results obtained during the classification process [6].
- ANN learning is very slow if a very low learning rate is used, whereas a high learning rate causes oscillations in training and blocks the convergence of the learning process [37].
- It is difficult to transform or model the problem as a network in ANN [40].
- The results produced by an ANN are difficult to understand, because an ANN gives no clue about the structure of its model, which is hard to predict [41, 42].

Decision Tree Algorithm
Advantages:
- It is simple to explain and interpret the feature relationships and interactions [43-45].
- The model produced by a DT is easy to interpret and understand because it consists of simple IF-THEN statements [46, 47].
- The DT is easier to implement than other methods [46].
- It requires less time in the classification process [43, 47].
Disadvantages:
- The DT does not support online learning and requires rebuilding the tree each time new samples arrive, and rebuilding the tree requires more time [45].
- The classification result of a DT is lower than that of other ML methods [43].
- The DT becomes more complex as the number of features increases [48].
- The DT is unable to deal with missing values [44].

k-means clustering
Advantages:
- K-means clustering can overcome the drawback of requiring linearly separable clusters in vector space [20].
- This approach is able to minimize the clustering error in feature space [18].
- This method is easy to implement in the classification process, and the process is fast [49, 50].
Disadvantages:
- This approach depends entirely on the initial random assignments or search direction, which leads to poor results if the initialization is not properly defined [49, 50].
- This approach may be unable to classify whether a website is phishing, treating it as 'maybe' (uncertain) or as a missing value [18].
- This approach requires higher computational resources and memory because it needs to use all the input variables in the classification process to achieve higher accuracy [51].

Naive Bayes Algorithm
Advantages:
- It is simple and converges quickly. It is also an easy and straightforward method [45].
- It uses a small amount of data to estimate the important features for the classification process [47, 52, 53].
- It requires less time in the classification process than other methods [53, 54].
- It can handle missing values by incorporating the overall probabilities of the missing values [44].
Disadvantages:
- The NB cannot learn the interactions and relationships between the features in each sample, which leads to low accuracy [45].
- It needs a large amount of data to achieve higher accuracy [53, 55, 56].
- The NB requires a large space to store data due to its instance-based nature, where the NB stores all training samples in its process [56].
- The NB is not sensitive to the data because it cannot show the relationships between variables, even though the variables may depend entirely on each other [57].

Random Forest Algorithm
Advantages:
- The speed and efficiency of RF are good when it is applied to large datasets and high-dimensional problems with multi-class output [29, 58].
- Its performance is better and more robust than that of other methods, with higher accuracy especially on nonlinear problems [27, 35, 43, 59].
- Overfitting is a common problem in classification, but RF is able to overcome it: if there are enough trees in the forest, RF will not overfit the model [35, 60].
- RF is easy to implement, and the results produced by RF are easy to interpret [35].
Disadvantages:
- The main limitation of RF is that a large number of trees can make the algorithm slow and ineffective for real-time predictions. A more accurate prediction requires more trees, which results in a slower model [61].
- RF is a predictive modeling tool, not a descriptive tool; it gives no description of the relationships in the data, which makes the model hard to interpret [62].
- RF is sensitive: a small change in a parameter value can lead to a significant change in the model and the result [63].
- The results produced by RF are not consistent because RF uses random factors in bootstrapping, bagging, and constructing trees, which makes it difficult to prove the consistency of its results [64].

Support Vector Machine Algorithm
Advantages:
- The SVM is known for its higher classification accuracy and its ability to classify data that is not linearly separable [45].
- It performs better than other methods when applied to high-dimensional data with a minimal amount of data [43, 65, 66].
- It is good at handling a large number of attributes and a large amount of data.
- SVM is memory efficient and converges quickly to the optimum solution because it uses a subset of training points in the decision process [67].
- It is a robust model for solving prediction problems since it maximizes the margin [68].
Disadvantages:
- SVM demands a convex combination of kernels, which makes the classification process time-consuming [43, 45].
- The model produced by SVM is hard to interpret and understand [45, 69].
- It is hard to implement and to handle numerical variables in the classification problem [46].
- The parameters of the SVM are sensitive: they need to be set correctly, and the classification accuracy will suffer if they are not set properly [44].
- SVM is a binary classifier and can be applied directly to binary classification problems; applying it to multi-class classification requires some modifications [51, 70].


4. CONCLUSION

This paper presents an overview of phishing detection. Phishing can be considered as an illegal activity by hackers to steal sensitive information of Internet users, such as login credentials or bank account information, by redirecting them to illegitimate websites. Usually, the hacker sends Internet users an email that contains malicious software or a URL to the fake website. Phishing detection is essential because it can prevent hackers from stealing Internet users' information. This paper discussed the four features that can be considered in the detection of phishing attacks: URL-based, domain-based, page-based, and content-based features. Phishing detection can be treated as a classification problem, and one way to detect phishing attacks is by using ML methods. Six popular and widely used ML classification methods have been presented and discussed in detail, covering the mechanism, strengths, and weaknesses of each method. The ML methods chosen are NB, SVM, DT, RF, k-means clustering, and ANN. Based on this discussion, it is hard to determine which method is best, because each method has its own advantages and disadvantages. The selection of a method depends on the problem and the features selected, because no single method works best on every problem, and these methods can be applied in a variety of problem domains [71-91].